[ENH, REF] Add uncombined support and modularize multi-file renamers #424
Conversation
Codecov Report
@@ Coverage Diff @@
## master #424 +/- ##
==========================================
+ Coverage 74.99% 76.03% +1.04%
==========================================
Files 35 36 +1
Lines 2863 2946 +83
==========================================
+ Hits 2147 2240 +93
+ Misses 716 706 -10
Continue to review full report at Codecov.
Awesome, thanks!
Ideally, there should be a PR against the BIDS spec introducing this, referencing existing BEPs which might already be taking advantage of it. E.g., I could see people suggesting to use ... Also, we do need to figure out how to test all the refactored/added functionality. I probably can collect some phantom data here, just need to know "how" and whether we have representative sequences.
I have bids-standard/bids-specification#370 open, but I'd be happy to open a pull request as well. I'm just not sure if it will be reviewed.
I don't have enough access to my institution's scanner to collect phantom data, but I can share the PDF for the SWI protocol we're using that has uncombined, complex, multi-echo data (all three of the functions I'm proposing here!). See attached (swi_acq-qsm_run-01.pdf).
we don't have SWI sequences here... But now I recalled that I have some sample dataset with a bunch of SWIs on a phantom; I am checking if it is shareable. It might be a great use case, it is just too big :-( (3GB of DICOMs), so we might need to prepare a truncated version for any sensibly fast testing.
I can collect a phantom dataset with that protocol right now, if it helps.
yes!
I've checked -- I was wrong, it was not done on a phantom :-(
I got the dataset.
Then, I copied the data there.
I've opened bids-standard/bids-specification#425, which uses ...
AWESOME! The only thing that could have been done better is if you had added ... Well, it is not required per se, but if you wanted to make your fork of the velasco superdataset properly contain it -- that could be the way. Later though I would add it on datasets.datalad.org and you would have run into conflicts when you do ...
Run ...
Sorry about that. Let me guide you a bit. The best user-oriented documentation ATM is http://handbook.datalad.org . For this particular aspect, see http://handbook.datalad.org/en/latest/basics/101-138-sharethirdparty.html#leveraging-third-party-infrastructure . Alternatives (to what is presented in the chapter as of today): ...
Second attempt to make the dataset available via DataLad:
At this point, I have a new GitHub repository with the tree structure of my dataset. The dcm files are just some symbolic links (to my ...)
oh, we totally stole this PR with all the datalad... heh, sorry about that @tsalo. Was there output from ...?
Sorry for the delay (it took a long while to upload all the 2K+ files to Box). (@yarikoptic, if it is OK, I can delete all my comments related to uploading the data, to clean up the PR.)
box.com remote would be private I believe, i.e. not really accessible without users authenticating in box.com... found some old account there. Could you make sure annexed files are available to [email protected]? I think you might need to add me to collaborators or something.
would have probably made a nice <50M .tar.gz posted to figshare ;)
I made the git-annex folder available to anybody with the link. I'm not sure if they force you to have a box.com account. Can you try with your ...?
Yes, totally.
but what is the link? ;) the underlying problem is that git annex only knows that ...
i.e. that it is available from ...
The link for the git-annex is: ...
heudiconv/convert.py
Outdated
# Determine if data are complex (magnitude + phase)
magnitude_found = any(['M' in it for it in image_types])
phase_found = any(['P' in it for it in image_types])
is_complex = magnitude_found and phase_found
I think that this should address #421.
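For illustration, this is roughly how that check behaves on ImageType values like those dcm2niix writes into JSON sidecars for Siemens data (the specific values below are hypothetical, not taken from this PR's test data):
# Hypothetical ImageType lists, one per converted file in the series
image_types = [
    ["ORIGINAL", "PRIMARY", "M", "ND", "NORM"],  # a magnitude image
    ["ORIGINAL", "PRIMARY", "P", "ND"],          # a phase image
]
magnitude_found = any(['M' in it for it in image_types])  # True
phase_found = any(['P' in it for it in image_types])      # True
is_complex = magnitude_found and phase_found              # True: both parts present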
Woohoo -- FWIW, this branch (without current 0.7.0 master merged) is (more) correctly handling the multi-echo PD+T1w bundle (see #346 for more info/discussion): it properly adds _echo- entities to files:

$> heudiconv/utils/test-compare-two-versions.sh heudiconv{-master,-424} --bids -f reproin --files /home/yoh/datalad/dicoms/dartmouth-phantoms/bids_test6-PD+T2w/005-anat-PD+T2w_acq-tse-tra-3mm
Not running heudiconv since compare-versions/v0.7.0-1-g57c9c56 already exists
Running /home/yoh/picts/mris/heudiconv-424/venvs/dev3/bin/heudiconv with log in compare-versions/v0.6.0-20-g788bd96.log
Results differ: /home/yoh/picts/mris/compare-versions/v0.7.0-1-g57c9c56-v0.6.0-20-g788bd96.diff
.heudiconv/qa/info/dicominfo.tsv | 4
.heudiconv/qa/info/heuristic.py | 64
sub-qa/anat/sub-qa_acq-tseXtraX3mm_PD+T2w1.json | 114
sub-qa/anat/sub-qa_acq-tseXtraX3mm_PD+T2w1.nii.gz |34146 -------------------------------------------------------------------------------------------------------------------------------------------------------------------
sub-qa/anat/sub-qa_acq-tseXtraX3mm_PD+T2w2.json | 114
sub-qa/anat/sub-qa_acq-tseXtraX3mm_PD+T2w2.nii.gz |32279 ----------------------------------------------------------------------------------------------------------------------------------------------------------
sub-qa/anat/sub-qa_acq-tseXtraX3mm_echo-1_PD+T2w.json | 114
sub-qa/anat/sub-qa_acq-tseXtraX3mm_echo-1_PD+T2w.nii.gz |34146 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
sub-qa/anat/sub-qa_acq-tseXtraX3mm_echo-2_PD+T2w.json | 114
sub-qa/anat/sub-qa_acq-tseXtraX3mm_echo-2_PD+T2w.nii.gz |32279 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
sub-qa/sub-qa_scans.tsv | 4
11 files changed, 66674 insertions(+), 66704 deletions(-)
heudiconv/utils/test-compare-two-versions.sh heudiconv{-master,-424} --bids - 62.28s user 1.56s system 97% cpu 1:05.34 total
@pvelasco thanks again for sharing the data! Unfortunately I took a shortcut and didn't look into how to make peace with the box.com special remote, so just downloaded "manually" and shared those files now from http://datasets.datalad.org/?dir=/dicoms/velasco/SWIdataset (AKA ...)
and then people would be able to ...
I have filed #435 which is the rebased version of this PR. Since our test coverage is not really covering those use cases - I didn't want to just force push. @tsalo - please review #435 and see if I've not screwed up anything, and then I will force push here.

case 1: PD+T1w

$> heudiconv/utils/test-compare-two-versions.sh heudiconv{-master,} --bids -f reproin --files /home/yoh/datalad/dicoms/dartmouth-phantoms/bids_test6-PD+T2w/005-anat-PD+T2w_acq-tse-tra-3mm
Running /home/yoh/picts/mris/heudiconv-master/venvs/dev3/bin/heudiconv with log in compare-versions/v0.7.0-24-g7d2c526.log
Running /home/yoh/picts/mris/heudiconv/venvs/dev3/bin/heudiconv with log in compare-versions/v0.7.0-34-g15abe59.log
Results differ: /home/yoh/picts/mris/compare-versions/v0.7.0-24-g7d2c526-v0.7.0-34-g15abe59.diff
anat/sub-qa_acq-tseXtraX3mm_PD+T2w1.json | 114
anat/sub-qa_acq-tseXtraX3mm_PD+T2w1.nii.gz |34146 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
anat/sub-qa_acq-tseXtraX3mm_PD+T2w2.json | 114
anat/sub-qa_acq-tseXtraX3mm_PD+T2w2.nii.gz |32279 ----------------------------------------------------------------------------------------------------------------------------------------------------------------
anat/sub-qa_acq-tseXtraX3mm_echo-1_PD+T2w.json | 114
anat/sub-qa_acq-tseXtraX3mm_echo-1_PD+T2w.nii.gz |34146 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
anat/sub-qa_acq-tseXtraX3mm_echo-2_PD+T2w.json | 114
anat/sub-qa_acq-tseXtraX3mm_echo-2_PD+T2w.nii.gz |32279 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
sub-qa_scans.tsv | 4

case 2: multi-echo with sbref

$> heudiconv/utils/test-compare-two-versions.sh heudiconv{-master,} --random-seed 1 --bids -f reproin --files dbic-misc/1075_sequence/sourcedata/sub-sid001567
Running /home/yoh/picts/mris/heudiconv-master/venvs/dev3/bin/heudiconv with log in compare-versions/v0.7.0-24-g7d2c526.log
Running /home/yoh/picts/mris/heudiconv/venvs/dev3/bin/heudiconv with log in compare-versions/v0.7.0-34-g15abe59.log
Results differ: /home/yoh/picts/mris/compare-versions/v0.7.0-24-g7d2c526-v0.7.0-34-g15abe59.diff
filegroup.json | 2144 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------------------------------------------------
1 file changed, 1072 insertions(+), 1072 deletions(-)
FWIW, now there is a simplistic T1w uncombined dataset: http://datasets.datalad.org/?dir=/dicoms/dartmouth-phantoms/ANAT-T1W_UNCOMBINED-20200330 and also a collection of sequences (EPI and DWI are multiband but no split channels) from Philips: http://datasets.datalad.org/?dir=/dicoms/umass-philips/Zheng_Test_03302020 . I think I might see if any of those comes in handy for establishing at least some basic testing.
And mag back to magnitude.
force-pushed from 788bd96 to a2d9099
FWIW, force pushed, rebased on current master.
@yarikoptic Is this PR waiting on anything (e.g., bids-standard/bids-specification#370, more testing, etc.) before it can be merged?
yeah, both: ...
FWIW, the test is IMHO more important ;) since per-channel data is quite rare, I guess we could fix up later after merge.
…ile-renamers
# Conflicts:
#	heudiconv/convert.py
@yarikoptic I could excise the channel stuff and save it for another PR since it's built off of the modularization, if that makes things easier.
Let's proceed with the current state and maybe excise or tune up as a last resort? Maybe only change _channel to _ch, since that is what we seem to have converged on in that bids spec PR.
I have left a bunch of nit-picking pythonic comments; hopefully they will be taken as of value and not of pain ;) But the largest one is that we would (possibly later) need to RF it again to avoid code duplication across the now nicely modularized functions ;) Meanwhile we could at least make it a bit shorter and thus even easier to read. Also, since there are now nice functions, it would be nice to have unittest(s), at least for one of them, so later we could assure the same behavior.
heudiconv/convert.py
Outdated
channel_names.append(metadata.get('CoilString', None))
image_types.append(metadata.get('ImageType', None))
echo_times = [v for v in echo_times if v]
echo_times = sorted(list(set(echo_times)))
no need for an explicit list() here -- sorted would work fine on any iterator AFAIK
so it could be one line:
echo_times = sorted(set(v for v in echo_times if v))
IMHO still readable.
heudiconv/convert.py
Outdated
# Determine if data are complex (magnitude + phase)
magnitude_found = any(['M' in it for it in image_types])
phase_found = any(['P' in it for it in image_types])
minor, also no need for an explicit [] list -- any('P' in it for it in image_types) would be just fine since you would just get an iterator comprehension now
actually, since we do not care about those _found variables, nor about retaining the list of lists for image_types, and at the end just want sets of values:
- start with sets, not lists, and either .add or .update (for image_types) - that would remove the need for all the sorted and explicit set.
- and then here it would just become a is_multiecho = len(filter(bool, echo_times)) > 1 (or is_multiecho = len(v for v in echo_times if v) > 1 in current form) and is_complex = 'M' in image_types and 'P' in image_types
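A minimal sketch of that set-based variant, assuming echo times come from an EchoTime sidecar field and that all_metadata stands for the per-file metadata being iterated over (both names are placeholders, not heudiconv's actual variables):
all_metadata = [  # hypothetical per-file sidecar dicts
    {"EchoTime": 0.005, "CoilString": "H1", "ImageType": ["ORIGINAL", "PRIMARY", "M"]},
    {"EchoTime": 0.010, "CoilString": "H2", "ImageType": ["ORIGINAL", "PRIMARY", "P"]},
]
echo_times, channel_names, image_types = set(), set(), set()
for metadata in all_metadata:
    echo_times.add(metadata.get("EchoTime"))            # .add for scalar values
    channel_names.add(metadata.get("CoilString"))
    image_types.update(metadata.get("ImageType", []))   # .update flattens the list of lists
is_multiecho = len({t for t in echo_times if t}) > 1      # True for these sample dicts
is_uncombined = len({c for c in channel_names if c}) > 1  # True for these sample dicts
is_complex = "M" in image_types and "P" in image_types    # True for these sample dicts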
It looks like it has to be is_multiecho = len(set(filter(bool, echo_times))) > 1, but it's now in there and looks much cleaner.
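A small illustration of why both the set() and the materialization matter: filter() returns an iterator in Python 3, so it has no len(), and without deduplication the same echo time repeated across files would inflate the count (sample values below are hypothetical):
echo_times = [0.005, 0.005, 0.010, None, 0.010]
# len(filter(bool, echo_times)) raises TypeError in Python 3 (iterators have no len()),
# and len([t for t in echo_times if t]) would give 4 even though there are only 2 echoes.
is_multiecho = len(set(filter(bool, echo_times))) > 1  # 2 unique echo times -> True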
heudiconv/convert.py
Outdated
if bids_file and is_uncombined:
    this_prefix_basename = update_uncombined_name(
        bids_meta, this_prefix_basename, channel_names
    )
why not have a single if bids_file: above all of the possible cases? IMHO, thanks to the nice variable names, comments aren't even needed ;)
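Presumably something along these lines is meant; only update_uncombined_name and its argument list appear in this PR's diff, so the other renamer cases are elided to a comment rather than guessed:
from heudiconv.convert import update_uncombined_name  # assumed importable as in this PR

def build_prefix_basename(this_prefix_basename, bids_file, bids_meta,
                          is_uncombined, channel_names):
    # Sketch of the hoisted check: one `if bids_file:` guarding all renamers,
    # so each case only tests its own condition.
    if bids_file:
        if is_uncombined:
            this_prefix_basename = update_uncombined_name(
                bids_meta, this_prefix_basename, channel_names
            )
        # ... same pattern for the multi-echo and complex (magnitude/phase) cases ...
    return this_prefix_basename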
this_prefix_basename = this_prefix_basename.replace(
    label, "_channel-%s%s" % (channel_number, label)
)
break
duplicate lines got me thinking -- wouldn't it be the same as
if (label == filetype) or (label in this_prefix_basename):
    this_prefix_basename = this_prefix_basename.replace(
        label, "_channel-%s%s" % (channel_number, label)
    )
    break
and it feels like all 3 functions are largely duplicates of the same logic... ok -- I could look into refactoring it all later. As it came up in #441, we seem to not even have a convenience construct and/or helpers to decompose / recompose (in the standard order defined in the BIDS entities table) a standard bids filename into key: value pairs. I feel that we need a basic class such as BIDSFile (possibly at large implemented by pybids, but that one is too heavy ATM to depend on IMHO), which would have a simple dict interface for all the keys, plus an additional attribute (e.g. here you call it .filetype), and whose __str__ would just return the properly formatted filename. Then checks would become much easier etc. But ok -- we could do that later I guess.
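A rough sketch of that idea follows; this is not an existing heudiconv or pybids API, and the entity order list is abbreviated and purely illustrative:
from collections import OrderedDict

# Illustrative subset of the BIDS entity order; the real table is longer.
_ENTITY_ORDER = ["sub", "ses", "task", "acq", "ce", "rec", "dir", "run",
                 "echo", "part", "ch"]

class BIDSFile:
    """Dict-like bag of BIDS entities plus a suffix; str() renders the filename."""
    def __init__(self, entities, suffix, extension=".nii.gz"):
        self.entities = OrderedDict(entities)
        self.suffix = suffix            # e.g. "T1w", "bold" (the ".filetype" above)
        self.extension = extension

    def __getitem__(self, key):
        return self.entities[key]

    def __setitem__(self, key, value):
        self.entities[key] = value

    def __str__(self):
        parts = ["%s-%s" % (key, self.entities[key])
                 for key in _ENTITY_ORDER if key in self.entities]
        return "_".join(parts + [self.suffix]) + self.extension

# e.g. str(BIDSFile({"sub": "01", "acq": "qsm", "echo": "2"}, "MEGRE"))
# -> "sub-01_acq-qsm_echo-2_MEGRE.nii.gz"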
Please add a dedicated unit-test for at least one of these functions.
Done!
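(For readers following along, a minimal pytest-style sketch of what such a test could look like; the inputs and the asserted _channel- substring are inferred from the snippets quoted in this review, not copied from heudiconv's actual test suite.)
from heudiconv.convert import update_uncombined_name

def test_update_uncombined_name():
    # Hypothetical inputs: per-file metadata with a CoilString, the list of all
    # channel names in the series, and a BIDS-ish prefix to be updated.
    metadata = {"CoilString": "H3"}
    channel_names = ["H1", "H2", "H3", "H4"]
    prefix = "sub-01_acq-qsm_T2starw"
    out = update_uncombined_name(metadata, prefix, channel_names)
    # The renamer is expected to inject a _channel-<index> entity; exact
    # placement and zero-padding are assumptions.
    assert "_channel-" in out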
- Some refactoring of name-updaters.
- Add tests for name-updaters.
…s' into ref/modularize-multifile-renamers
# Conflicts:
#	CHANGELOG.md
#	docs/conf.py
#	docs/installation.rst
#	docs/usage.rst
#	heudiconv/convert.py
#	heudiconv/info.py
Closes #423. I think that any other multi-file situations that operate similarly to multi-echo and uncombined data could also be dealt with in a similar manner.
Changes proposed:
- _channel-<index> entity (not currently in specification).